Zhou Zhao

Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

May 29, 2026
Via arXiv

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

May 28, 2026
Via arXiv

DocRetriever: A Plug-and-Play Framework for Multimodal Document Retrieval with Comprehensive Benchmark

May 28, 2026
Via arXiv

Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

May 27, 2026
Via arXiv

From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents

May 25, 2026
Via arXiv

TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation

May 03, 2026
Via arXiv

Diffusion Model as a Generalist Segmentation Learner

Apr 27, 2026
Via arXiv

Bridging the Pose-Semantic Gap: A Cascade Framework for Text-Based Person Anomaly Search

Apr 25, 2026
Via arXiv

Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models

Apr 16, 2026
Via arXiv

WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training

Apr 16, 2026
Via arXiv